WikiDes: A Wikipedia-based dataset for generating short descriptions from paragraphs
Authors
Abstract
As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset for generating short descriptions of Wikipedia articles, framed as a text summarization problem. The dataset consists of over 80k English samples on 6987 topics. We set up a two-phase summarization method, description generation (Phase I) and candidate ranking (Phase II), as a strong approach that relies on transfer and contrastive learning. For description generation, T5 and BART show their superiority compared to other small-scale pre-trained models. By applying contrastive learning to diverse inputs from beam search, the metric fusion-based ranking models significantly outperform the direct description generation models, by ≈ 22 ROUGE, in both the topic-exclusive split and the topic-independent split. Furthermore, the outcome of Phase II is supported by human evaluation: its descriptions were chosen in 45.33% of comparisons against the gold descriptions, compared with 23.66% for Phase I. In terms of sentiment analysis, the generated descriptions cannot effectively capture all sentiment polarities of the paragraphs, although they do this task better with respect to the gold descriptions. The automatic generation of new descriptions reduces the human effort of creating them and enriches Wikidata-based knowledge graphs. Our paper shows a practical impact, since there are thousands of missing descriptions. Finally, we expect WikiDes to be a useful dataset for related work on capturing salient information from paragraphs. The curated dataset is publicly available at https://github.com/declare-lab/WikiDes.
• Wikipedia-based dataset for generating short descriptions from paragraphs.
• Two-phase summarization: description generation and candidate ranking.
• Sentiment consistency analysis between paragraph and description.
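The abstract mentions metric fusion-based ranking of beam-search candidates (Phase II) but does not state which metrics are fused or how. As a minimal sketch, assuming an equal-weight average of ROUGE-1 F1 and ROUGE-L F1 (a hypothetical choice, not necessarily what WikiDes uses), the Python below orders candidate descriptions by their fused score against a gold description. It only illustrates how fused scores could be computed to label candidates for training. The paper's contrastive ranker, which has to score candidates without a reference at inference time, is not reproduced. The names `fused_score` and `rank_candidates` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def _f1(overlap: int, cand_len: int, ref_len: int) -> float:
    # Harmonic mean of precision and recall; 0.0 when nothing overlaps.
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(cand: List[str], ref: List[str]) -> float:
    # Clipped unigram overlap: Counter intersection keeps the minimum count per token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_len(a: List[str], b: List[str]) -> int:
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rougeL_f1(cand: List[str], ref: List[str]) -> float:
    return _f1(lcs_len(cand, ref), len(cand), len(ref))


def fused_score(candidate: str, reference: str) -> float:
    # Hypothetical fusion: plain average of ROUGE-1 F1 and ROUGE-L F1,
    # over naive lowercased whitespace tokens.
    c = candidate.lower().split()
    r = reference.lower().split()
    return (rouge1_f1(c, r) + rougeL_f1(c, r)) / 2


def rank_candidates(candidates: List[str], reference: str) -> List[Tuple[str, float]]:
    # Sort beam-search candidates from best to worst fused score.
    scored = [(cand, fused_score(cand, reference)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, ranking `["American singer", "American singer and songwriter", "British actor"]` against the gold description `"American singer and songwriter"` puts the exact match first (score 1.0), then "American singer" (about 0.67), then "British actor" (0.0). A real setup would likely use an established ROUGE implementation and proper tokenization rather than whitespace splitting.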
Similar resources
Generating Coherent Argumentative Paragraphs
We address the problem of generating a coherent paragraph presenting arguments for a conclusion in a text generation system. Existing text planning techniques are not appropriate for this task for two main reasons: they do not explain how arguments can be i...
Q: Should I take AI this semester? A: If you want to take courses like Natural Language Processing or Expert Systems or Vision next semester, ...
Generating Coherent Paragraphs
Recent years have seen something of a divorce between models of computer-aided instruction and theories of second language learning. For example, it has come to be recognized that a satisfactory model of language teaching must incorporate the notion of 'communicative competence', that is, not only the production of grammatically correct utterances, but also the appropriate use of utterances ac...
A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs
We introduce a manually-created, multi-reference dataset for abstractive sentence and short paragraph compression. First, we examine the impact of single- and multi-sentence level editing operations on human compression quality as found in this corpus. We observe that substitution and rephrasing operations are more meaning preserving than other operations, and that compressing in context improves ...
Generating Educational Tourism Narratives from Wikipedia
We present a narrative theory-based approach to data mining that generates cohesive stories from a Wikipedia corpus. This approach is based on a data mining-friendly view of narrative derived from narratology, and uses a prototype mining algorithm that implements this view. Our initial test case and focus is that of field-based educational tour narrative generation, for which we have successful...
Generating Auction Configurations from Declarative Contract Descriptions
This work presents an approach to automating the negotiation of business contracts and describes an implementation of a subset of this overall goal. To support automated contract negotiation, we are developing a language for both (1.) fully-specified, executable contracts and (2.) partially-specified contracts that are in the midst of being negotiated, specifically via automated auctions. The langu...
Journal
Journal title: Information Fusion
Year: 2023
ISSN: ['1566-2535', '1872-6305']
DOI: https://doi.org/10.1016/j.inffus.2022.09.022